Multivariate statistics
Suppose that the joint pmf \(f_{X,Y}\) of \(X\) and \(Y\) is given in the table below.
| | \(Y=0\) | \(Y=1\) |
|---|---|---|
| \(X=0\) | \(\frac{2}{10}\) | \(\frac{3}{10}\) |
| \(X=1\) | \(\frac{2}{10}\) | \(\frac{3}{10}\) |
Show that \(X\) and \(Y\) are statistically independent.
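\[ \begin{align*} p_X(0)\,p_Y(0) &= \left(\frac{2}{10}+\frac{3}{10}\right)\times \left(\frac{2}{10}+\frac{2}{10}\right) = 0.2 = p_{X,Y}(0,0) \\ p_X(0)\,p_Y(1) &= \left(\frac{2}{10}+\frac{3}{10}\right)\times \left(\frac{3}{10}+\frac{3}{10}\right) = 0.3 = p_{X,Y}(0,1) \\ p_X(1)\,p_Y(0) &= \left(\frac{2}{10}+\frac{3}{10}\right)\times \left(\frac{2}{10}+\frac{2}{10}\right) = 0.2 = p_{X,Y}(1,0) \\ p_X(1)\,p_Y(1) &= \left(\frac{2}{10}+\frac{3}{10}\right)\times \left(\frac{3}{10}+\frac{3}{10}\right) = 0.3 = p_{X,Y}(1,1) \end{align*} \]
Since \(p_{X,Y}(x,y)=p_X(x)\,p_Y(y)\) holds for every cell, \(X\) and \(Y\) are independent.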
What is the marginal pmf \(f_Y\) of \(Y\)?
\[ p_Y(0)=\frac{4}{10}, p_Y(1)=\frac{6}{10} \]
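The check above can be reproduced mechanically. The following is a minimal sketch in Python, assuming the joint pmf is stored as a dictionary keyed by \((x, y)\); the helper names `marginals` and `is_independent` are illustrative, not part of the original exercise.

```python
from fractions import Fraction
from itertools import product

# Joint pmf from the table, keyed by (x, y). Exact fractions avoid rounding.
joint = {
    (0, 0): Fraction(2, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(3, 10),
}

def marginals(joint):
    """Return the marginal pmfs (p_X, p_Y) by summing out the other variable."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y

def is_independent(joint):
    """True if p(x, y) == p_X(x) * p_Y(y) for every pair of values."""
    p_x, p_y = marginals(joint)
    return all(
        joint.get((x, y), 0) == p_x[x] * p_y[y]
        for x, y in product(p_x, p_y)
    )

p_x, p_y = marginals(joint)
print("p_Y:", p_y)
print("independent:", is_independent(joint))
```

Using `Fraction` keeps the comparison exact, so the equality test does not depend on floating-point tolerance.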
In a medical study of the effect of an exposure on cancer, the results are summarized below:
| | Cancer | No Cancer |
|---|---|---|
| Exposure | a | b |
| Control | c | d |
If the odds ratio equals 1:
\[ \begin{align*} & \frac{ad}{bc}=1 \\ \rightarrow & \frac{a}{b}=\frac{c}{d} \quad \text{Definition of statistical independence} \end{align*} \]
If the relative risk equals 1:
\[ \begin{align*} & \frac{\frac{a}{a+b}}{\frac{c}{c+d}}=1 \\ \rightarrow & \frac{a}{a+b}=\frac{c}{c+d} \\ \rightarrow & ac+ad=ac+bc \\ \rightarrow & ad=bc \\ \rightarrow & \frac{a}{b}=\frac{c}{d} \quad \text{Definition of statistical independence} \end{align*} \]
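As a quick numerical illustration of the two derivations, the sketch below computes the odds ratio and the relative risk for a 2×2 table. The counts are hypothetical, chosen only so that \(a/b = c/d\) in the first case; the function names are likewise illustrative.

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    """Odds ratio ad / bc for the 2x2 table [[a, b], [c, d]]."""
    return Fraction(a * d, b * c)

def relative_risk(a, b, c, d):
    """Relative risk: P(cancer | exposure) / P(cancer | control)."""
    return Fraction(a, a + b) / Fraction(c, c + d)

# Hypothetical counts with a/b == c/d (both 1/4): no association.
balanced = (20, 80, 5, 20)
# Hypothetical counts where exposure raises the cancer rate.
unbalanced = (30, 70, 10, 90)

for name, table in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(name, odds_ratio(*table), relative_risk(*table))
```

When \(a/b = c/d\), both measures equal 1 together, which matches the algebra: each condition reduces to \(ad = bc\).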
Consider a random variable \(X\) uniformly distributed on \(\{-1, 0, 1\}\), and let \(Y = X^2\).
\[ \begin{aligned} \text{Cov}(X, Y) &= \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] \\ \mathbb{E}[X] &= 0 \;\Rightarrow\; \text{Cov}(X, Y) = \mathbb{E}[XY] \\ XY &= X\times X^2 = X^3, \text{ which takes the values } \{-1, 0, 1\} \\ \mathbb{E}[XY] &= \frac{1}{3}(-1) + \frac{1}{3}(0) + \frac{1}{3}(1) = 0 \;\Rightarrow\; \text{Cov}(X, Y)=0 \end{aligned} \]
No, they are dependent. If you know \(X=0\), you know for certain that \(Y=0\). If you know \(X=1\), you know \(Y=1\). One variable conveys perfect information about the other (specifically, \(Y\) is a deterministic function of \(X\)).
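Both claims, zero covariance and dependence, can be confirmed by direct enumeration over the three equally likely values of \(X\). This is a small sketch; the helper `expect` is an illustrative name.

```python
from fractions import Fraction

support = [-1, 0, 1]   # values of X
p = Fraction(1, 3)     # each value is equally likely

def expect(g):
    """E[g(X)] for X uniform on the support."""
    return sum(p * g(x) for x in support)

e_x = expect(lambda x: x)
e_y = expect(lambda x: x**2)        # Y = X^2
e_xy = expect(lambda x: x * x**2)   # XY = X^3
cov = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = sum(p for x in support if x == 0)
p_y0 = sum(p for x in support if x**2 == 0)
p_x0_y0 = sum(p for x in support if x == 0 and x**2 == 0)

print("Cov(X, Y) =", cov)
print("P(X=0, Y=0) =", p_x0_y0, "vs P(X=0)P(Y=0) =", p_x0 * p_y0)
```

The covariance is exactly zero, yet the joint probability \(\frac{1}{3}\) differs from the product \(\frac{1}{9}\), so zero covariance does not imply independence here.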
Marginal Density on a Unit Square: Let two continuous random variables \(X\) and \(Y\) have the joint probability density function (PDF):
\[ f_{X,Y}(x,y) = \begin{cases} x + y & \text{for } 0 \le x \le 1, 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \]
\[ f_X(x) = \int_{0}^{1} (x + y) dy = \left[ xy + \frac{y^2}{2} \right]_{y=0}^{y=1} = (x + 0.5) - 0 = x + 0.5 \]
\[ \begin{aligned} P(X > 0.5) = &\int_{0.5}^{1} (x + 0.5) dx = \left[ \frac{x^2}{2} + 0.5x \right]_{0.5}^{1} \\ = & (0.5 + 0.5) - (0.125 + 0.25) = 1 - 0.375 = 0.625 \end{aligned} \]
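No, \(X\) and \(Y\) are not independent. By symmetry \(f_Y(y) = y + 0.5\), and the product of the marginals differs from the joint density:
\[ \begin{aligned} f_X(x) \cdot f_Y(y) = & (x + 0.5)(y + 0.5) \\ = & xy + 0.5x + 0.5y + 0.25 \\ \neq & x+y \end{aligned} \]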
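The integrals above can be sanity-checked numerically. The sketch below uses a plain midpoint rule (an assumption made for simplicity; any quadrature routine would do), and the function names are illustrative.

```python
def joint_pdf(x, y):
    """f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def midpoint_integral(f, lo, hi, n=1000):
    """Approximate the integral of f over [lo, hi] with n midpoint slices."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def marginal_x(x):
    """f_X(x): integrate the joint density over y in [0, 1]."""
    return midpoint_integral(lambda y: joint_pdf(x, y), 0.0, 1.0)

# P(X > 0.5): integrate the marginal over [0.5, 1].
prob = midpoint_integral(marginal_x, 0.5, 1.0, n=200)

print("f_X(0.3) ~", marginal_x(0.3))
print("P(X > 0.5) ~", prob)
# Independence check at one point: joint density vs product of marginals.
print(joint_pdf(0.2, 0.2), "vs", marginal_x(0.2) * marginal_x(0.2))
```

Because the integrands are linear, the midpoint rule reproduces \(f_X(x) = x + 0.5\) and \(P(X > 0.5) = 0.625\) up to floating-point rounding. The last line compares \(f_{X,Y}(0.2, 0.2) = 0.4\) with \(f_X(0.2)\,f_Y(0.2) = 0.49\), using \(f_Y = f_X\) by symmetry.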
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a Tesla; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 2, which has a goat. He then says to you, “Do you want to pick door No. 3?” Is it to your advantage to switch your choice?
Let \(C_1\), \(C_2\), and \(C_3\) denote the events that the car is behind door 1, 2, and 3; let \(O_1\), \(O_2\), and \(O_3\) denote the events that the host opens door 1, 2, and 3; and let \(B_1\) denote the event that you pick door 1.
\[ \begin{align*} P(C_1 | O_2,B_1) &= \frac{P(C_1 \& O_2 \& B_1)}{P(O_2 \& B_1)} \\ &= \frac{\frac{1}{2}\times\frac{1}{3}\times\frac{1}{3}}{\frac{1}{2}\times\frac{1}{3}\times\frac{1}{3}+0\times\frac{1}{3}\times\frac{1}{3}+1\times\frac{1}{3}\times\frac{1}{3}} \\ &= \frac{\frac{1}{18}}{\frac{1}{18}+0+\frac{1}{9}} = \frac{1}{3} \end{align*} \]
Now generalize to \(n\) doors: after you choose a door, the host opens all of the remaining doors except one, leaving that last door unopened.
Let \(C_1, \dots, C_n\) denote the events that the car is behind door \(1, \dots, n\), and suppose you pick door 1. Let \(O\) denote the event that the host opens doors 2 through \(n-1\), leaving door \(n\) closed.
\[ \begin{align*} P(C_1 | O) &= \frac{P(C_1 \& O)}{P(O)} \\ &= \frac{\frac{1}{n-1}\times\frac{1}{n}}{\frac{1}{n-1}\times\frac{1}{n} + 0 +0 + \dots +1\times\frac{1}{n}} \\ & = \frac{1}{n} \end{align*} \]
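So switching is the better strategy: staying wins with probability \(\frac{1}{n}\), while switching wins with probability \(\frac{n-1}{n}\).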
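The result can also be checked by simulation. The following is a rough Monte Carlo sketch of the \(n\)-door game, assuming the host always leaves exactly one other door closed (the car's door whenever your first pick is wrong); `play` and `win_rate` are illustrative names.

```python
import random

def play(n_doors, switch, rng):
    """Play one round; return True if the final choice wins the car."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    if pick == car:
        # Host leaves a random goat door closed.
        other = rng.choice([d for d in range(n_doors) if d != pick])
    else:
        # Host must leave the car's door closed.
        other = car
    final = other if switch else pick
    return final == car

def win_rate(n_doors, switch, trials=100_000, seed=0):
    """Estimate the win probability over many independent rounds."""
    rng = random.Random(seed)
    return sum(play(n_doors, switch, rng) for _ in range(trials)) / trials

for n in (3, 10):
    print(n, "doors: stay", win_rate(n, False), "switch", win_rate(n, True))
```

The estimates should land close to \(\frac{1}{n}\) for staying and \(\frac{n-1}{n}\) for switching, up to sampling noise.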
A table is ruled with equidistant parallel lines a distance D apart. A needle of length L, where \(L\leq D\), is randomly thrown on the table. What is the probability that the needle will intersect one of the lines (the other probability being that the needle will be completely contained in the strip between two lines)?
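Let \(X\) be the distance from the needle's midpoint to the nearest line, and let \(\theta\) be the acute angle between the needle and the perpendicular to the lines. What is the possible range of \(X\)? What is the probability distribution of \(X\)?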
\[ f(x,\theta)= \begin{cases} \frac{4}{D\pi} & \text{for } 0 \leq x \leq \frac{D}{2}, 0 \leq \theta \leq \frac{\pi}{2} \\ 0 & \text{elsewhere} \end{cases} \]
\[ \begin{aligned} P(X < \frac{L}{2}\cos\theta) = & \int_{\theta=0}^{\frac{\pi}{2}}\int_{x=0}^{\frac{L}{2}\cos\theta}\frac{4}{D\pi}dxd\theta \\ = & \frac{4}{D\pi}\int_{\theta=0}^{\frac{\pi}{2}}\frac{L}{2}\cos\theta d\theta = \frac{4}{D\pi}\frac{L}{2}\sin\theta \big|^{\frac{\pi}{2}}_0 \\ = & \frac{2L}{D\pi} \end{aligned} \]
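The closed form \(\frac{2L}{D\pi}\) can be compared against a Monte Carlo estimate. This sketch samples \(x\) and \(\theta\) uniformly, as in the joint density above, and counts how often \(x < \frac{L}{2}\cos\theta\); the function name and the example values \(L = 1\), \(D = 2\) are assumptions for illustration.

```python
import math
import random

def buffon_estimate(length, spacing, trials=200_000, seed=0):
    """Estimate the crossing probability for a needle with length <= spacing."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, spacing / 2)        # midpoint distance to nearest line
        theta = rng.uniform(0.0, math.pi / 2)    # angle to the perpendicular
        if x < (length / 2) * math.cos(theta):   # needle reaches the line
            hits += 1
    return hits / trials

length, spacing = 1.0, 2.0
exact = 2 * length / (spacing * math.pi)
print("simulated:", buffon_estimate(length, spacing))
print("exact:    ", exact)
```

With these example values the exact probability is \(\frac{1}{\pi}\), and the simulated frequency should agree to about two decimal places.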